Here’s a look at the Palmer Penguins dataset:
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex <fct> male, female, female, NA, female, male, female, male…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
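The code that produced this summary is not shown. A minimal sketch, assuming the data frame is `penguins` from the `palmerpenguins` package and that the summary was printed with `dplyr::glimpse()`, might look like this:

```r
# Assumption: the data frame is `penguins` from the palmerpenguins package.
library(palmerpenguins)
library(dplyr)

# Print one line per column: name, type, and the first few values.
glimpse(penguins)
```

The `<fct>`, `<dbl>`, and `<int>` tags in the output are the column types (factor, double, and integer).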
The steps of data science are:
We don’t need to do any real cleaning for this dataset, but we can show some pretty pictures…
Here, we see that male and female penguins have distinct body mass distributions within each species. Don’t believe your eyes?
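The original figure is not reproduced here. One possible way to draw this kind of comparison, assuming `ggplot2` and a density plot split by species (both are my choices, not necessarily the original ones), is sketched below:

```r
library(palmerpenguins)
library(ggplot2)

# Drop rows with unknown sex so each species has exactly two groups.
penguins_known_sex <- subset(penguins, !is.na(sex))

# One density curve per sex, one panel per species.
p <- ggplot(penguins_known_sex, aes(x = body_mass_g, fill = sex)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~species, ncol = 1) +
  labs(x = "Body mass (g)", y = "Density", fill = "Sex")

p
```

Stacking the panels in one column keeps the body mass axis aligned, which makes the shift between sexes easier to compare across species.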
Here are some stats:
## # A tibble: 6 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3369. 36.2 93.0 2.55e-237
## 2 sexmale 675. 51.2 13.2 4.37e- 32
## 3 speciesChinstrap 158. 64.2 2.47 1.42e- 2
## 4 speciesGentoo 1311. 54.4 24.1 1.92e- 74
## 5 sexmale:speciesChinstrap -263. 90.8 -2.89 4.06e- 3
## 6 sexmale:speciesGentoo 130. 76.4 1.71 8.89e- 2
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 3368.8356 | 36.21222 | 93.030365 | 0.0000000 |
| sexmale | 674.6575 | 51.21181 | 13.173867 | 0.0000000 |
| speciesChinstrap | 158.3703 | 64.24029 | 2.465279 | 0.0142039 |
| speciesGentoo | 1310.9058 | 54.42228 | 24.087666 | 0.0000000 |
| sexmale:speciesChinstrap | -262.8928 | 90.84950 | -2.893718 | 0.0040627 |
| sexmale:speciesGentoo | 130.4372 | 76.43559 | 1.706498 | 0.0888649 |
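The term names above (`sexmale`, `speciesGentoo`, `sexmale:speciesChinstrap`) are what R produces for a linear model with a sex-by-species interaction. A minimal sketch that should give a table of this shape, assuming `lm()` for the fit and `broom::tidy()` for the summary, is:

```r
library(palmerpenguins)
library(broom)

# `sex * species` expands to both main effects plus their interaction.
# lm() drops rows with missing values by default.
fit <- lm(body_mass_g ~ sex * species, data = penguins)

# One row per coefficient: estimate, standard error, t statistic, p-value.
tidy(fit)
```

Female Adelie penguins are the reference group here, so the intercept is their mean body mass and every other estimate is a difference relative to it. The second table looks like the same result passed through a table formatter such as `knitr::kable()`, which would explain why the very small p-values print as 0.0000000; that is a guess.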
The equation for our model is:
\[ E( \operatorname{body\_mass\_g} ) = \alpha + \beta_{1}(\operatorname{sex}_{\operatorname{male}}) + \beta_{2}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{3}(\operatorname{species}_{\operatorname{Gentoo}}) + \beta_{4}(\operatorname{sex}_{\operatorname{male}} \times \operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{5}(\operatorname{sex}_{\operatorname{male}} \times \operatorname{species}_{\operatorname{Gentoo}}) \]
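As a worked example, plugging the estimates from the coefficient table into this equation gives the expected body mass for a single group. For a male Gentoo penguin, the male and Gentoo indicators equal 1 and the Chinstrap indicators equal 0, so only the intercept, \(\beta_{1}\), \(\beta_{3}\), and \(\beta_{5}\) contribute:

\[ E( \operatorname{body\_mass\_g} ) = 3368.8356 + 674.6575 + 1310.9058 + 130.4372 \approx 5484.8 \]

For a male Chinstrap penguin, the terms that contribute are the intercept, \(\beta_{1}\), \(\beta_{2}\), and \(\beta_{4}\):

\[ E( \operatorname{body\_mass\_g} ) = 3368.8356 + 674.6575 + 158.3703 - 262.8928 \approx 3939.0 \]

For a female Adelie penguin every indicator is 0, so the expected body mass is just the intercept, about 3368.8 g.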